Smart Paper Notes

Precise Affordance Annotation for Egocentric Action Video Datasets

Zecheng Yu , Yifei Huang , Ryosuke Furuta , Takuma Yagi , Yusuke Goutsu , Yoichi Sato

Category: Computer Vision

2022-06-11

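Object affordance is an important concept in human-object interaction. It provides information on action possibilities based on human motor capacity and objects' physical properties, benefiting tasks such as action anticipation and robot imitation learning. However, existing datasets often: 1) mix up affordance with object functionality; 2) confuse affordance with goal-related action; and 3) ignore human motor capacity. This paper proposes an efficient annotation scheme that addresses these issues by combining goal-irrelevant motor actions and grasp types as affordance labels, and by introducing the concept of mechanical action to represent the action possibilities between two objects. We provide new annotations by applying this scheme to the EPIC-KITCHENS dataset and test our annotation with tasks such as affordance recognition. We qualitatively verify that models trained with our annotation can distinguish affordance and mechanical actions.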

Translated by Google Translate

Domain Adaptive Hand Keypoint and Pixel Localization in the Wild

Takehiko Ohkawa , Yu-Jhe Li , Qichen Fu , Ryosuke Furuta , Kris M. Kitani , Yoichi Sato

Category: Computer Vision | Machine Learning

2022-03-16

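We aim to improve the performance of hand keypoint regression and pixel-level hand mask segmentation under new imaging conditions (e.g., outdoors) when only labeled images taken under very different conditions (e.g., indoors) are available. In the real world, it is important that a model trained for both tasks works under various imaging conditions. However, the variation covered by existing labeled hand datasets is limited. It is therefore necessary to adapt a model trained on labeled images (source) to unlabeled images (target) with unseen imaging conditions. Although self-training domain adaptation methods (i.e., learning from the unlabeled target images in a self-supervised manner) have been developed for both tasks, their training may degrade performance when the predictions on the target images are noisy. To avoid this, it is crucial to assign a lower importance (confidence) weight to noisy predictions during self-training. In this paper, we propose to use the divergence of two predictions to estimate the confidence of a target image for both tasks. The predictions come from two separate networks, and their divergence helps identify noisy predictions. To integrate the proposed confidence estimation into self-training, we propose a teacher-student framework in which the two networks (teachers) supervise a network (student) for self-training, and the teachers are learned from the student by knowledge distillation. Our experiments show that the method outperforms state-of-the-art methods in adaptation settings with different lighting, grasped objects, backgrounds, and camera viewpoints. Compared with the latest adversarial adaptation method, our method improves the multi-task score on HO3D by 4%. We also validate our method on Ego4D, where outdoor imaging conditions change rapidly.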
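The idea of down-weighting noisy pseudo-labels by the disagreement between two teacher predictions can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the mean-squared-difference divergence, the `exp(-scale * divergence)` mapping, and the averaged pseudo-label are hypothetical and are not taken from the paper.

```python
import math

def divergence_confidence(pred_a, pred_b, scale=1.0):
    """Confidence in (0, 1] for one sample, given two flat prediction lists.

    Assumption: divergence is the mean squared difference, and confidence
    decays exponentially with it. The paper's actual formula may differ.
    """
    divergence = sum((a - b) ** 2 for a, b in zip(pred_a, pred_b)) / len(pred_a)
    return math.exp(-scale * divergence)

def weighted_self_training_loss(student_preds, teacher_a_preds, teacher_b_preds, scale=1.0):
    """Confidence-weighted squared error between student and pseudo-labels."""
    total, weight_sum = 0.0, 0.0
    for s, a, b in zip(student_preds, teacher_a_preds, teacher_b_preds):
        conf = divergence_confidence(a, b, scale)
        # Hypothetical pseudo-label: the average of the two teacher predictions.
        pseudo = [(x + y) / 2 for x, y in zip(a, b)]
        err = sum((p - q) ** 2 for p, q in zip(s, pseudo)) / len(s)
        total += conf * err
        weight_sum += conf
    # Guard against all confidences underflowing to zero.
    return total / weight_sum if weight_sum > 0 else 0.0
```

Samples on which the two teachers disagree strongly contribute little to the loss, so noisy target predictions have less influence on the student.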


Supervised Anomaly Detection Method Combining Generative Adversarial Networks and Three-Dimensional Data in Vehicle Inspections

Yohei Baba , Takuro Hoshi , Ryosuke Mori , Gaurang Gavai

Category: Computer Vision | Machine Learning

2022-12-22

External inspections of rolling stock's underfloor equipment are currently performed visually by human inspectors. In this study, we attempt to partly automate visual inspection by investigating anomaly inspection algorithms that use image processing technology. As railroad maintenance studies tend to have little anomaly data, unsupervised learning methods are usually preferred for anomaly detection; however, training cost and accuracy are still a challenge. Additionally, prior work has created anomalous images from normal images by adding noise and similar perturbations, but the anomaly targeted in this study is the rotation of piping cocks, which is difficult to create using noise. Therefore, in this study, we propose a new method that applies style conversion via generative adversarial networks to three-dimensional computer graphics to imitate anomaly images, enabling anomaly detection based on supervised learning. A geometry-consistent style conversion model was used to convert the images, so that the color and texture were successfully made to imitate real images while maintaining the anomalous shape. Using the generated anomaly images as supervised data, the anomaly detection model can be easily trained without complex adjustments and successfully detects anomalies.


GraphIX: Graph-based In silico XAI(explainable artificial intelligence) for drug repositioning from biopharmaceutical network

Atsuko Takagi , Mayumi Kamada , Eri Hamatani , Ryosuke Kojima , Yasushi Okuno

Category: Machine Learning

2022-12-21

Drug repositioning holds great promise because it can reduce the time and cost of new drug development. While drug repositioning can omit various R&D processes, confirming pharmacological effects on biomolecules is essential for application to new diseases. Biomedical explainability in a drug repositioning model can support appropriate insights in subsequent in-depth studies. However, the validity of the XAI methodology is still under debate, and the effectiveness of XAI in drug repositioning prediction applications remains unclear. In this study, we propose GraphIX, an explainable drug repositioning framework using biological networks, and quantitatively evaluate its explainability. GraphIX first learns the network weights and node features using a graph neural network from known drug indication and knowledge graph that consists of three types of nodes (but not given node type information): disease, drug, and protein. Analysis of the post-learning features showed that node types that were not known to the model beforehand are distinguished through the learning process based on the graph structure. From the learned weights and features, GraphIX then predicts the disease-drug association and calculates the contribution values of the nodes located in the neighborhood of the predicted disease and drug. We hypothesized that the neighboring protein node to which the model gave a high contribution is important in understanding the actual pharmacological effects. Quantitative evaluation of the validity of protein nodes' contribution using a real-world database showed that the high contribution proteins shown by GraphIX are reasonable as a mechanism of drug action. GraphIX is a framework for evidence-based drug discovery that can present to users new disease-drug associations and identify the protein important for understanding its pharmacological effects from a large and complex knowledge base.


P2Net: A Post-Processing Network for Refining Semantic Segmentation of LiDAR Point Cloud based on Consistency of Consecutive Frames

Yutaka Momma , Weimin Wang , Edgar Simo-Serra , Satoshi Iizuka , Ryosuke Nakamura , Hiroshi Ishikawa

Category: Computer Vision | Robotics

2022-12-01

We present a lightweight post-processing method to refine the semantic segmentation results of point cloud sequences. Most existing methods usually segment frame by frame and encounter the inherent ambiguity of the problem: based on a measurement in a single frame, labels are sometimes difficult to predict even for humans. To remedy this problem, we propose to explicitly train a network to refine these results predicted by an existing segmentation method. The network, which we call the P2Net, learns the consistency constraints between coincident points from consecutive frames after registration. We evaluate the proposed post-processing method both qualitatively and quantitatively on the SemanticKITTI dataset that consists of real outdoor scenes. The effectiveness of the proposed method is validated by comparing the results predicted by two representative networks with and without the refinement by the post-processing network. Specifically, qualitative visualization validates the key idea that labels of the points that are difficult to predict can be corrected with P2Net. Quantitatively, overall mIoU is improved from 10.5% to 11.7% for PointNet [1] and from 10.8% to 15.9% for PointNet++ [2].
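For readers unfamiliar with the metric quoted above, mean intersection-over-union (mIoU) can be sketched as follows. The function name is hypothetical, and skipping classes that appear in neither prediction nor ground truth is an assumed convention; the benchmark's official evaluation script may handle such classes differently.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes for two equal-length label sequences."""
    ious = []
    for c in range(num_classes):
        # Points where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Points where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumed convention: skip classes absent from both
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")
```

With `pred = [0, 0, 1, 1]` and `target = [0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.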


Surgical Skill Assessment via Video Semantic Aggregation

Zhenqiang Li , Lin Gu , Weimin Wang , Ryosuke Nakamura , Yoichi Sato

Category: Computer Vision

2022-08-04

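Automated video-based assessment of surgical skills is a promising task for assisting young surgical trainees, especially in resource-poor areas. Existing works often resort to a joint CNN-LSTM framework that models long-term relationships with LSTMs on spatially pooled short-term CNN features. However, this practice inevitably neglects the differences among semantic concepts such as tools, tissues, and background in the spatial dimension, which impedes the subsequent temporal relationship modeling. In this paper, we propose a novel skill assessment framework, Video Semantic Aggregation (ViSA), which discovers different semantic parts and aggregates them across spatiotemporal dimensions. The explicit discovery of semantic parts provides an explanatory visualization that helps in understanding the neural network's decisions. It also enables us to further incorporate auxiliary information, such as kinematic data, to improve representation learning and performance. Experiments on two datasets show the competitiveness of ViSA compared with state-of-the-art methods. Source code is available at: bit.ly/miccai2022visa.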


GOF-TTE: Generative Online Federated Learning Framework for Travel Time Estimation

Zhiwen Zhang , Hongjun Wang , Jiyuan Chen , Zipei Fan , Xuan Song , Ryosuke Shibasaki

Category: Machine Learning | Artificial Intelligence

2022-07-02

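Estimating the travel time of a path is an essential topic for intelligent transportation systems. It is the foundation of real-world applications such as traffic monitoring, route planning, and taxi dispatching. However, building a model for such a data-driven task requires a large amount of users' travel information, which relates directly to their privacy and is therefore unlikely to be shared. The non-independent and identically distributed (non-IID) trajectory data across data owners also makes building a predictive model extremely challenging if federated learning is applied directly. Finally, previous work on travel time estimation does not consider the real-time traffic state of roads, which we argue can greatly influence the prediction. To address these challenges, we introduce GOF-TTE, a Generative Online Federated Learning Framework for Travel Time Estimation for mobile user groups, which i) uses a federated learning approach that allows private data to be kept on client devices during training, and designs the global model shared by all clients as an online generative model that infers the real-time road traffic state; and ii) in addition to sharing a base model at the server, adapts a fine-tuned personalized model for each client to learn their personal driving habits, compensating for the residual error of the localized global model's predictions. We also mount a simple privacy attack on our framework and implement a differential privacy mechanism to further guarantee privacy safety. Finally, we conduct experiments on two real-world public taxi datasets, DiDi Chengdu and Xi'an. The experimental results demonstrate the effectiveness of our proposed framework.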


Learning Deep Input-Output Stable Dynamics

Yuji Okamoto , Ryosuke Kojima

Category: Machine Learning | Robotics

2022-06-27

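Learning stable dynamics from observed time-series data is an essential problem in robotics, physical modeling, and systems biology. Many of these dynamics are represented as input-output systems that communicate with the external environment. In this study, we focus on input-output stable systems, which exhibit robustness against unexpected stimuli and noise. We propose a method to learn nonlinear systems with guaranteed input-output stability. The proposed method uses a differentiable projection onto the space satisfying the Hamilton-Jacobi inequality to realize input-output stability. The problem of finding this projection can be formulated as a quadratically constrained quadratic programming problem, and we derive a particular solution analytically. We also apply our method to a toy bistable model and to the task of training on a benchmark generated from a glucose-insulin simulator. The results show that a nonlinear system with neural networks trained by our method achieves input-output stability, unlike naive neural networks. Our code is available at https://github.com/clinfo/deepiostability.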
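For context, the Hamilton-Jacobi inequality mentioned in this abstract is usually stated in the following textbook form for an input-affine system. The symbols $f$, $g$, $h$, $V$, and $\gamma$ are assumed notation for this sketch; the paper's exact formulation and constants may differ.

```latex
% Input-affine system (assumed notation):
%   \dot{x} = f(x) + g(x)\,u, \qquad y = h(x).
% The L2 gain from u to y is at most \gamma if there exists a
% differentiable V(x) \ge 0 such that, for all x,
\nabla V(x)^{\top} f(x)
  + \frac{1}{2\gamma^{2}}\, \nabla V(x)^{\top} g(x)\, g(x)^{\top} \nabla V(x)
  + \frac{1}{2}\, h(x)^{\top} h(x) \;\le\; 0 .
% This follows from requiring
%   \dot{V} \le \tfrac{\gamma^{2}}{2}\,\|u\|^{2} - \tfrac{1}{2}\,\|y\|^{2}
% for every input u and maximizing the left-hand side over u.
```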


Route to Time and Time to Route: Travel Time Estimation from Sparse Trajectories

Zhiwen Zhang , Hongjun Wang , Zipei Fan , Jiyuan Chen , Xuan Song , Ryosuke Shibasaki

Category: Artificial Intelligence

2022-06-21

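Owing to the rapid development of Internet of Things (IoT) technologies, many online web applications (e.g., Google Map and Uber) estimate travel time from trajectory data collected by mobile devices. In practice, however, complex factors such as network communication and energy constraints cause many trajectories to be collected at a low sampling rate. This paper therefore aims to solve the problems of travel time estimation (TTE) and route recovery in sparse scenarios, where both the travel time labels and the routes between consecutively sampled GPS points are often uncertain. We formulate this as an inexact supervision problem in which the training data has coarse-grained labels, and we solve the TTE and route recovery tasks jointly. We argue that the two tasks complement each other during model learning: more precise travel times lead to better route inference, which in turn results in more accurate time estimation. Based on this assumption, we propose an EM algorithm for sparse trajectories that alternates between estimating the travel time of the inferred route through weak supervision in the E-step and retrieving the route based on the estimated travel time in the M-step. We conducted experiments on three real-world trajectory datasets and demonstrated the effectiveness of the proposed method.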


Replacing Labeled Real-image Datasets with Auto-generated Contours

Hirokatsu Kataoka , Ryo Hayamizu , Ryosuke Yamada , Kodai Nakashima , Sora Takashima , Xinyu Zhang , Edgar Josafat Martinez-Noriega , Nakamasa Inoue , Rio Yokota

Category: Computer Vision | Artificial Intelligence | Machine Learning

2022-06-18

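In the present work, we show that the performance of formula-driven supervised learning (FDSL) can match or even exceed that of ImageNet-21k without using real images, human supervision, or self-supervision during the pre-training of Vision Transformers (ViTs). For example, ViT-Base pre-trained on ImageNet-21k shows 81.8% top-1 accuracy when fine-tuned on ImageNet-1k, and FDSL is evaluated after pre-training under the same conditions (number of images, hyperparameters, and number of epochs). Images generated by formulas avoid the privacy/copyright issues, labeling costs and errors, and biases that real images suffer from, and thus have tremendous potential for pre-training general models. To understand the performance of the synthetic images, we tested two hypotheses: (i) object contours are what matter in FDSL datasets, and (ii) increasing the number of parameters used to create labels improves FDSL pre-training performance. To test the former hypothesis, we constructed a dataset consisting of simple combinations of object contours. We found that this dataset can match the performance of fractals. For the latter hypothesis, we found that increasing the difficulty of the pre-training task generally leads to better fine-tuning accuracy.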
